Abstract

Massive MIMO has been regarded as one of the key technologies for 5G wireless networks, as it can significantly 
improve both the spectral efficiency and energy efficiency. The availability of high-dimensional channel side information
(CSI) is critical for its promised performance gains, but the overhead of acquiring CSI may potentially deplete
the available radio resources. Fortunately, it has recently been discovered that harnessing various sparsity structures 
in massive MIMO channels can lead to significant overhead reduction, and thus improve the system performance. 
This paper presents and discusses the use of sparsity-inspired CSI acquisition techniques for massive MIMO, as 
well as the underlying mathematical theory. Sparsity-inspired approaches for both frequency-division duplexing and 
time-division duplexing massive MIMO systems will be examined and compared from an overall system perspective, 
including the design trade-offs between the two duplexing modes, computational complexity of acquisition algorithms, 
and applicability of sparsity structures. Meanwhile, some future prospects for research on high-dimensional CSI 
acquisition to meet practical demands will be identified. 


Index Terms 

Massive MIMO, channel estimation, pilot contamination, pilot sequences, sparsity, compressed sensing, $\ell_1$ minimization.


I. Introduction 

Massive MIMO systems promise to boost spectral efficiency by more than one order of magnitude [1], [2].
The full benefits of massive MIMO, however, will never come to fruition without the base stations (BSs) having
adequate channel knowledge, the acquisition of which appears to be an extremely challenging task [3]. The challenges posed by
MIMO channels of very high dimension are confronted in both frequency-division duplexing (FDD) and time- 
division duplexing (TDD) massive MIMO systems. In the FDD mode, both the pilot-aided training overhead and 
the feedback overhead for CSI acquisition grow proportionally with the BS antenna size. However, the proportion
of radio resources allocated to CSI acquisition is severely restricted by the channel coherence period. The situation
is made worse in an environment with high user equipment (UE) mobility.

Fig. 1: Pilot reuse in multiple cells. (a) FDD downlink training. (b) TDD uplink training.

In view of this, a considerable research effort has been devoted to TDD massive MIMO by exploiting channel 
reciprocity. Although the training overhead for TDD operation becomes proportional to the number of active UEs 
rather than that of BS antennas, the inevitable reuse of the same pilot in neighboring cells can seriously degrade 
the quality of obtained channel knowledge. This is because the channels to UEs in adjacent cells who share the 
same pilot will be collectively acquired by the BS. In other words, the desired channel obtained by the BS will be 
contaminated by interference channels. Once this contaminated channel knowledge is utilized for transmitting or 
receiving data, intercell interference occurs immediately and hence limits the achievable performance. This problem, 
known as pilot contamination, cannot be circumvented simply by adding more BS antennas.

Several attempts have been made to tackle the challenges of acquiring high-dimensional CSI in massive MIMO. 
For instance, in [4], open/closed loop training that utilizes temporal and spatial channel statistics is proposed
to reduce the amount of downlink training overhead. For mitigating pilot contamination, the optimal design of
precoding matrices aimed at minimizing the square errors caused by pilot reuse has shown its superiority over
linear precoding [5]. Thanks to the recent advances in compressed sensing [6], [7], sparse signal processing has
attracted much attention in such high-dimensional settings, which has also demonstrated its power in CSI acquisition 
in terms of reconstructing CSI from a limited number of channel measurements. Various sparsity structures exhibited 
by massive MIMO channels have recently been identified, thereby motivating the development of new strategies 
for CSI acquisition. Surprisingly, not only can high training overhead be reduced, but pilot contamination can also 
be resolved by appealing to sparsity-inspired approaches. 

In this paper, we provide a comprehensive overview of the state-of-the-art research on sparsity-inspired approaches 
for high-dimensional CSI acquisition. In Section II, the challenges in FDD and TDD massive MIMO are reviewed in 
detail, including a rarely mentioned issue of FDD pilot contamination. On the basis of different sparsity structures, 
a variety of methods for either achieving overhead reduction or alleviating the effects of pilot reuse are examined 
and compared in Section III. Finally, concluding remarks are made in Section IV.

Notations: $\mathbb{C}$: complex numbers, $\Re$: real part, $\|\cdot\|_p$: p-norm, $(\cdot)^T$: transpose, $(\cdot)^H$: Hermitian transpose, $\mathbf{I}_N$: $N \times N$
identity matrix, $\mathcal{N}(\cdot, \cdot)$: normal distribution, $\mathbb{E}[\cdot]$: expectation, $\mathbf{0}$: zero vector, card$(\cdot)$: cardinality, supp$(\cdot)$: the set of
indices of non-zero elements, Var$(\cdot)$: variance, max$\{\cdot\}$: the maximum element, vec$(\cdot)$: vectorization, $\otimes$: Kronecker
product, $\succeq$: matrix inequality, $(\cdot)^\dagger$: pseudo-inverse.


II. Challenges of High-Dimensional CSI Acquisition 

In massive MIMO systems with high-dimensional channels, CSI acquisition at BSs is a fundamentally challenging 
problem. In FDD massive MIMO, performing this task consumes a considerable amount of radio resources which 
is proportional to the dimension of channels. On the other hand, in TDD-mode operation, it is hard to ensure 
the orthogonality of pilot sequences in the multicell scenario as the number of overall UEs becomes large. As a 
result, the inevitable reuse of correlated pilot sequences in different cells, known as pilot contamination, causes 
capacity-limiting intercell interference. 

To illustrate these difficulties further, we will consider a massive MIMO network consisting of L hexagonal cells. 
In each cell, there is a BS equipped with an M-element linear array$^1$ serving K single-antenna UEs. The channel
between BS i and UE k in cell j is denoted by the $M \times 1$ vector $\mathbf{h}_{i,j,k}$. The number of BS antennas is assumed to be
much larger than the number of served UEs.


A. FDD Massive MIMO 

In the FDD mode, obtaining CSI at BSs is normally performed in two steps. First, each BS sends a downlink training
matrix to its served UEs. Second, each UE estimates the desired channel based on the downlink measurements
and feeds back acquired CSI through dedicated uplink feedback channels. 

During downlink training, UE k in cell i receives channel measurements

$$\mathbf{y}_{i,k}^{\mathrm{DL}} = \mathbf{S}_{i}^{\mathrm{DL}} \mathbf{h}_{i,i,k} + \sum_{l \neq i} \mathbf{S}_{l}^{\mathrm{DL}} \mathbf{h}_{l,i,k} + \mathbf{z}_{i,k}^{\mathrm{DL}} \qquad (1)$$

where $\mathbf{S}_{l}^{\mathrm{DL}}$ denotes the $N \times M$ pilot training matrix used in cell l, $\mathbf{z}_{i,k}^{\mathrm{DL}}$ is the additive noise, while the first
term of the right-hand side (RHS) represents the desired channel measurements, and the next term results from
intercell interference. Even without considering the impact of intercell interference, the required training overhead
N for conventional least-squares (LS) or minimum mean square error (MMSE) estimators to achieve a reasonable
performance level still scales linearly with the BS antenna size. Taking intercell interference into account
increases the training overhead further. The explicit expressions of the optimal pilot training matrices
($N \ge M$) are provided in [8] for single-cell networks. In [9], the optimal design of training matrices for multicell
MIMO-OFDM systems is considered.
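
To make the linear scaling concrete, the following minimal sketch computes the fraction of a coherence block that downlink pilots would occupy if the pilot length N is set to its smallest LS-compatible value, N = M. The coherence block length of 200 symbols and the antenna counts are illustrative assumptions, not values taken from this paper.

```python
def training_fraction(num_antennas, coherence_symbols):
    """Fraction of a coherence block spent on downlink pilots when N = M."""
    pilot_len = num_antennas  # LS estimation needs at least M pilot symbols
    return min(1.0, pilot_len / coherence_symbols)


COHERENCE_SYMBOLS = 200  # assumed block length, for illustration only

fractions = {m: training_fraction(m, COHERENCE_SYMBOLS) for m in (8, 64, 128, 256)}
for m, frac in fractions.items():
    print(f"M = {m:3d}: {frac:.0%} of the coherence block used for training")
```

Under these assumed numbers, an array with more antennas than symbols in the coherence block leaves no room for data, which is the situation that sparsity-inspired training aims to avoid.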


$^1$ For simplicity, the assumption of employing linear arrays is made. However, most of the results discussed in this paper can be generalized
to include the cases of using planar or cylindrical arrays.
What makes the situation worse is that typical feedback channels are finite-rate. This implies that only quantized
versions of channel estimates can be fed back to BSs. If there are predefined codebooks consisting of precoding
vectors, then the index of the optimal codebook vector is required to be sent back [10], [11]. However, either the
amount of quantized CSI or the size of codebooks increases in proportion to the number of BS antennas, which in
turn makes these two limited feedback techniques impractical in FDD massive MIMO.

Note that when the same training matrix is repeatedly used in multiple cells, i.e., $\mathbf{S}_{1}^{\mathrm{DL}} = \cdots = \mathbf{S}_{L}^{\mathrm{DL}}$, this can
be regarded as pilot contamination in FDD massive MIMO. As a result of such contamination, as shown in Fig.
1(a), BS i will acquire the composite channel $\sum_{l=1}^{L} \mathbf{h}_{l,i,k}$ rather than the desired channel $\mathbf{h}_{i,i,k}$, assuming that the feedback
channel is error-free and the additive noise is ignored. Despite this fact, utilizing this composite CSI to
form a precoding vector and transmit signals at BS i will not cause serious interference to UEs in the neighboring
cells. For instance, given that maximum ratio transmission (MRT) precoding is employed, the transmitted signal
from BS i can be expressed as $\mathbf{x}_i = \sum_{k=1}^{K} \mathbf{w}_{i,k}^{\mathrm{DL}} x_{i,k}$, where $x_{i,k}$ is the signal intended for UE k within the cell, and
$\mathbf{w}_{i,k}^{\mathrm{DL}} = \left( \sum_{l=1}^{L} \mathbf{h}_{l,i,k} \right)^{*}$ denotes the MRT precoding vector. During the downlink transmission phase, the received
interference at UE m in cell j due to BS i is given by

$$I_{i,j,m} = \mathbf{h}_{i,j,m}^{T} \mathbf{x}_i = \sum_{k=1}^{K} \sum_{l=1}^{L} \mathbf{h}_{i,j,m}^{T} \mathbf{h}_{l,i,k}^{*} x_{i,k}. \qquad (2)$$

When the number of BS antennas grows without limit, the channel vectors are asymptotically orthogonal. Thus, the
channel products $\mathbf{h}_{i,j,m}^{T} \mathbf{h}_{l,i,k}^{*}$ (normalized by M) approach zero and so does the interference $I_{i,j,m}$. In other words, intercell interference
caused by pilot contamination diminishes asymptotically with increasing BS antenna size. This implies that there
is no need to mitigate intercell interference by making training matrices distinct from each other in the asymptotic
regime. Hence, the existing literature rarely addresses the issue of pilot contamination in FDD massive MIMO.
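
The asymptotic orthogonality argument can be checked numerically. The sketch below is only an illustration under an assumption that the text does not state explicitly: channel entries are drawn as i.i.d. unit-variance complex Gaussian variables (Rayleigh fading). It averages the normalized inner product of two independent channel vectors for a small and a large array.

```python
import random


def crandn(rng):
    """One sample of a unit-variance circularly symmetric complex Gaussian."""
    return complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) / 2 ** 0.5


def normalized_inner_product(m, rng):
    """Return |h1^H h2| / m for two independent i.i.d. channel vectors of length m."""
    h1 = [crandn(rng) for _ in range(m)]
    h2 = [crandn(rng) for _ in range(m)]
    return abs(sum(a.conjugate() * b for a, b in zip(h1, h2))) / m


def average_correlation(m, trials=100, seed=0):
    """Monte Carlo average of the normalized inner product over several draws."""
    rng = random.Random(seed)
    return sum(normalized_inner_product(m, rng) for _ in range(trials)) / trials


small_array = average_correlation(16)
large_array = average_correlation(1024)
print(f"M = 16:   {small_array:.3f}")
print(f"M = 1024: {large_array:.3f}")
```

Under the i.i.d. assumption the average decays roughly as the inverse square root of M, so the larger array shows a much smaller normalized correlation.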

Note that uplink training in the FDD mode is not considered here. An explanation for this is provided as follows. 
The uplink CSI is mainly utilized for data acquisition in a multiple-access channel, instead of a broadcast channel. 
This means that more advanced signal processing techniques, such as blind multiuser detection, can be applied at 
the BS side. Thus, pilot-aided training may not be the best choice and CSI acquisition is not necessarily separated 
from data acquisition. 

B. TDD Massive MIMO 

Making massive MIMO operate in the TDD mode is a promising way to circumvent the identified difficulties 
in the FDD mode. Owing to channel reciprocity in the TDD mode, the CSI obtained via uplink training can be 
utilized for downlink transmission. More importantly, the cost of uplink training now increases linearly with the 
number of active UEs rather than that of BS antennas. Typically, obtaining accurate CSI requires each
UE to transmit an orthogonal pilot sequence to its serving BS. However, the number of available orthogonal pilot
sequences is limited by the ratio of the channel coherence interval to the channel delay spread [12], which may
be small due to the mobility of UEs or adverse physical environments. When the number of overall UEs becomes
large, the situation of using non-orthogonal pilot sequences, known as pilot contamination, inevitably arises. A
consequence of pilot contamination is intra- and inter-cell interference.

Fig. 2: The number ($K = 3l$) of admissible UEs versus pilot sequence length for the GWBE, WBE, and FOS
schemes, given a fixed SINR-requirement pattern, that is $\{\gamma_{1 \sim l} = 1/3,\ \gamma_{(l+1) \sim 2l} = 1,\ \gamma_{(2l+1) \sim 3l} = 3\}$ (from [13]).

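During the uplink training phase, the received signal at the ith BS is given by

$$\mathbf{Y}_{i}^{\mathrm{UL}} = \sum_{l=1}^{L} \mathbf{S}_{l}^{\mathrm{UL}} \mathbf{H}_{i,l} + \mathbf{Z}_{i}^{\mathrm{UL}} \qquad (3)$$

where $\mathbf{H}_{i,l} = [\mathbf{h}_{i,l,1}, \ldots, \mathbf{h}_{i,l,K}]^{T}$ consists of channel vectors from UEs in the lth cell to the ith BS, the columns
of $\mathbf{S}_{l}^{\mathrm{UL}}$ form a set of $\tau \times 1$ pilot sequences $\{\mathbf{s}_{l,k}\}_{k=1}^{K}$, and $\mathbf{Z}_{i}^{\mathrm{UL}}$ denotes an additive noise matrix. To illustrate the
case of intercell interference, assume that the same set of orthogonal pilot sequences is reused in each cell, i.e.,
$\mathbf{S}_{1}^{\mathrm{UL}} = \cdots = \mathbf{S}_{L}^{\mathrm{UL}}$ and $\mathbf{s}_{l,k_1}^{H} \mathbf{s}_{l,k_2} = 0$ for $k_1 \neq k_2$, as shown in Fig. 1(b). Employing the LS estimator yields the
channel estimate

$$\hat{\mathbf{H}}_{i,i} = \left[ (\mathbf{S}_{i}^{\mathrm{UL}})^{H} \mathbf{S}_{i}^{\mathrm{UL}} \right]^{-1} (\mathbf{S}_{i}^{\mathrm{UL}})^{H} \mathbf{Y}_{i}^{\mathrm{UL}} = \mathbf{H}_{i,i} + \sum_{l \neq i} \mathbf{H}_{i,l} + \left[ (\mathbf{S}_{i}^{\mathrm{UL}})^{H} \mathbf{S}_{i}^{\mathrm{UL}} \right]^{-1} (\mathbf{S}_{i}^{\mathrm{UL}})^{H} \mathbf{Z}_{i}^{\mathrm{UL}} \qquad (4)$$

where the rows of $\hat{\mathbf{H}}_{i,i}$ are given by $\hat{\mathbf{h}}_{i,i,k}^{T}$ with $\hat{\mathbf{h}}_{i,i,k} = \sum_{l=1}^{L} \mathbf{h}_{i,l,k}$ when ignoring the noise. During downlink transmission,
using these estimates to form the transmit signal $\mathbf{x}_i = \sum_{k=1}^{K} \mathbf{w}_{i,k}^{\mathrm{DL}} x_{i,k}$, where $\mathbf{w}_{i,k}^{\mathrm{DL}} = \sum_{l=1}^{L} (\mathbf{h}_{i,l,k})^{*}$ are MRT
precoding vectors, will cause interference

$$I_{i,j,m} = \mathbf{h}_{i,j,m}^{T} \mathbf{x}_i = \|\mathbf{h}_{i,j,m}\|_2^2 \, x_{i,m} + \sum_{k \neq m \text{ or } l \neq j} \mathbf{h}_{i,j,m}^{T} \mathbf{h}_{i,l,k}^{*} x_{i,k} \qquad (5)$$

to UE m in cell j. Though the second term on the RHS of (5) decreases with the increasing BS antenna size, the
first term, which does not vanish, makes the received signal-to-interference-plus-noise ratio (SINR) at UE m in cell
j converge to a limit and becomes the performance limiting factor.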
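
The contamination mechanism behind the LS estimate (4) can be reproduced in a few lines. The sketch below is a noise-free illustration with assumed toy dimensions (L = 3 cells, K = 2 UEs per cell, M = 4 antennas, pilot length 2) and i.i.d. complex Gaussian channels; none of these values come from the paper. Because every cell reuses the same orthogonal pilots, the LS estimate for UE k equals the sum of the channels of all UEs that share pilot k.

```python
import random


def crandn(rng):
    """One sample of a unit-variance circularly symmetric complex Gaussian."""
    return complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) / 2 ** 0.5


def received_pilots(pilots, channels):
    """Noise-free uplink observation: every cell sends the same pilots[k] for its UE k."""
    tau, m = len(pilots[0]), len(channels[0][0])
    return [[sum(pilots[k][t] * cell[k][a] for cell in channels for k in range(len(pilots)))
             for a in range(m)] for t in range(tau)]


def ls_estimate(pilots, received, k):
    """LS estimate for pilot k; valid in this form because the pilots are orthogonal."""
    energy = sum(abs(p) ** 2 for p in pilots[k])
    m = len(received[0])
    return [sum(complex(p).conjugate() * row[a] for p, row in zip(pilots[k], received)) / energy
            for a in range(m)]


rng = random.Random(0)
PILOTS = [[1, 1], [1, -1]]           # K = 2 orthogonal pilots of length tau = 2
NUM_CELLS, NUM_UES, NUM_ANTENNAS = 3, 2, 4
# channels[l][k] is the channel from UE k in cell l to the BS of cell 0
channels = [[[crandn(rng) for _ in range(NUM_ANTENNAS)] for _ in range(NUM_UES)]
            for _ in range(NUM_CELLS)]

estimate = ls_estimate(PILOTS, received_pilots(PILOTS, channels), 0)
composite = [sum(channels[l][0][a] for l in range(NUM_CELLS)) for a in range(NUM_ANTENNAS)]
mismatch = max(abs(e - c) for e, c in zip(estimate, composite))
contamination = max(abs(e - d) for e, d in zip(estimate, channels[0][0]))
print(f"distance to composite channel: {mismatch:.2e}")
print(f"distance to desired channel:   {contamination:.2e}")
```

The estimate matches the composite channel up to rounding and stays away from the desired channel, no matter how many antennas are used.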

The current investigation into TDD pilot contamination focuses on its impact on the received SINR or the sum 
rate when linear precoders/detectors are applied. However, very little is known about its impact on the system 
equipped with nonlinear precoders/detectors. A recent work [13] provides an interesting perspective on the user
capacity of pilot-contaminated massive MIMO, which quantifies the maximum number of admissible UEs given
their own SINR requirements. As shown in Fig. 2, the user capacity of three schemes$^2$ of joint pilot design and
transmit power allocation is fundamentally limited by the length of pilot sequences. For further details about pilot
contamination in TDD massive MIMO, the study [14] and references therein should be consulted.

III. Sparsity-Inspired CSI Acquisition 

Despite the challenges imposed by the high dimensionality of channel matrices, a number of research efforts 
have sought to address them and have achieved reasonably efficient CSI acquisition. In particular, sparsity-inspired
approaches have proven to be powerful tools, as presented below.

A. FDD Massive MIMO 

1) The Joint CSI Recovery Method: The authors of [15] proposed a method for low-overhead pilot training in the
single-cell scenario, taking advantage of channel sparsity. Provided that a uniform linear array with critically spaced
antennas is employed at the BS, the channel $\mathbf{h}_k$, where indices of BSs are dropped in the single-cell scenario,
exhibits a sparse representation $\mathbf{h}_k^{a}$ in the angular domain, i.e.,

$$\mathbf{h}_k = \mathbf{U} \mathbf{h}_k^{a}, \qquad (6)$$

where $\mathbf{U}$ is a discrete Fourier transform (DFT) matrix whose columns form an angular basis. The cardinality of
supp$(\mathbf{h}_k^{a})$ can be reasonably assumed to be much smaller than M because of limited local scattering at the BS,
whose antenna array is mounted higher than the surrounding scatterers. Additionally, based on the results in [16], it has
been argued that the channels to UEs are likely to share a partially common support in the angular domain, i.e.,
$\bigcap_{k=1}^{K} \mathrm{supp}(\mathbf{h}_k^{a}) = \Omega_c$. In order to utilize the channel sparsity and common support property simultaneously, channel
measurements acquired at UEs are fed back to the serving BS via error-free feedback channels. Hence, a joint
channel recovery problem can be formulated as follows:

$$\min_{\{\mathbf{h}_k^{a}, \forall k\}} \sum_{k=1}^{K} \left\| \mathbf{y}_k^{\mathrm{DL}} - \mathbf{S}^{\mathrm{DL}} \mathbf{U} \mathbf{h}_k^{a} \right\|_2^2 \quad \text{s.t.} \quad \bigcap_{k=1}^{K} \mathrm{supp}(\mathbf{h}_k^{a}) = \Omega_c. \qquad (7)$$

Using orthogonal matching pursuit (OMP) as a basis, a greedy algorithm has been proposed to efficiently solve
this problem. The simulation results show that the required training overhead for this recovery algorithm can be
significantly less than that for the conventional LS estimator. Moreover, the mean square error (MSE) performance
improves with the increasing cardinality of $\Omega_c$.

$^2$ The pilot sequences employed in the GWBE, WBE, and FOS schemes are respectively generalized Welch bound equality (GWBE) sequences,
WBE sequences, and finite orthogonal sequences (FOS) whose correlation among sequences is either 1 or 0. The same downlink power allocation,
$P_i \propto \gamma_i / (1 + \gamma_i)$, is used in the three schemes.

One major concern about this joint recovery approach is the underlying assumption of perfect channel measurements
being fed back. As practical feedback channels are rate-limited, it is more reasonable to assume quantized
measurements at the BS. The impact of quantization on the channel recovery performance requires further investigation.
On the other hand, it has been suggested that the amount of channel measurements needed at the BS
should be adaptively adjusted according to the sensitivity of the system performance to the CSI inaccuracy [17].
Furthermore, there has been little quantitative analysis of the required training overhead against the channel sparsity
level. Such quantification is much needed, as it would help measure the actual training overhead reduction that can
be achieved without relying on time-consuming simulations.
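
As a rough illustration of the greedy building block mentioned above, the following sketch runs plain single-UE OMP on noise-free measurements of a sparse angular-domain channel. It is not the joint algorithm of the cited work: the common-support constraint in (7) is ignored, the effective sensing matrix (the product of the training matrix and the DFT basis) is drawn directly as an i.i.d. complex Gaussian matrix, and the sizes M = 32, N = 16, s = 3 are assumptions chosen for the example.

```python
import random


def crandn(rng):
    """One sample of a unit-variance circularly symmetric complex Gaussian."""
    return complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) / 2 ** 0.5


def solve(gram, rhs):
    """Solve a small complex linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(gram)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0j] * n
    for r in reversed(range(n)):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def omp(A, y, sparsity):
    """Greedy recovery of a sparse x from y = A x, with A given as a list of rows."""
    n_rows, n_cols = len(A), len(A[0])
    cols = [[A[i][j] for i in range(n_rows)] for j in range(n_cols)]
    norms = [sum(abs(v) ** 2 for v in col) ** 0.5 for col in cols]
    residual, support, coeffs = list(y), [], []
    for _ in range(sparsity):
        # Select the not-yet-chosen column most correlated with the residual.
        scores = [abs(sum(c.conjugate() * r for c, r in zip(col, residual))) / nrm
                  for col, nrm in zip(cols, norms)]
        for j in support:
            scores[j] = -1.0
        support.append(max(range(n_cols), key=scores.__getitem__))
        # Least-squares fit on the selected columns via the normal equations.
        sel = [cols[j] for j in support]
        gram = [[sum(a.conjugate() * b for a, b in zip(ci, cj)) for cj in sel] for ci in sel]
        rhs = [sum(a.conjugate() * v for a, v in zip(ci, y)) for ci in sel]
        coeffs = solve(gram, rhs)
        residual = [y[i] - sum(c * col[i] for c, col in zip(coeffs, sel))
                    for i in range(n_rows)]
    x = [0j] * n_cols
    for j, c in zip(support, coeffs):
        x[j] = c
    return x, support, residual


rng = random.Random(1)
M, N, S = 32, 16, 3                       # assumed sizes: antennas, pilots, sparsity
A = [[crandn(rng) for _ in range(M)] for _ in range(N)]
true_support = rng.sample(range(M), S)
h_angular = [crandn(rng) if j in true_support else 0j for j in range(M)]
y = [sum(A[i][j] * h_angular[j] for j in range(M)) for i in range(N)]

x_hat, support, residual = omp(A, y, S)
y_norm = sum(abs(v) ** 2 for v in y) ** 0.5
res_norm = sum(abs(v) ** 2 for v in residual) ** 0.5
print("true support:", sorted(true_support), "selected:", sorted(support))
print(f"relative residual: {res_norm / y_norm:.2e}")
```

With N well below M the recovery can still succeed for sufficiently sparse channels, although success for a particular random draw is not guaranteed.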

2) The Weighted $\ell_1$ Minimization Method: Considering a similar single-cell scenario, the study in [18] has
drawn attention to utilizing partial support information of sparse massive MIMO channels, which is a collection
of indices of significant entries of channel vectors in the angular domain. The main advantage of using partial
support information is the possibility of achieving a remarkable training overhead reduction. Specifically, the order
of the required overhead decreases from $O(s \log M)$ to $O(s)$, where $s = \mathrm{card}[\mathrm{supp}(\mathbf{h}_k^{a})]$ is the sparsity
level of the angular-domain channel $\mathbf{h}_k^{a}$. Assume that the partial support information $T_k$ of channel $\mathbf{h}_k^{a}$ is available at UE k, where $\mathrm{card}(T_k) = s$ and
$\mathrm{card}[\mathrm{supp}(\mathbf{h}_k^{a}) \cap T_k]$ is given by $\lfloor \alpha s \rfloor$. The higher the factor $\alpha$, the higher the accuracy level of partial support
information. Based on a weighted $\ell_1$ minimization framework, the channel recovery is performed as follows:

$$\min_{\mathbf{h}_k^{a} \in \mathbb{C}^{M}} \|\mathbf{h}_k^{a}\|_{1,\mathbf{w}} \quad \text{subject to} \quad \left\| \mathbf{S}^{\mathrm{DL}} \mathbf{U} \mathbf{h}_k^{a} - \mathbf{y}_k^{\mathrm{DL}} \right\|_2 \le \epsilon, \quad \text{with} \quad w_i = \begin{cases} 1, & i \notin T_k, \\ 0, & i \in T_k, \end{cases} \qquad (8)$$

where $\mathbf{S}^{\mathrm{DL}} \in \mathbb{C}^{N \times M}$ is designed to be a Gaussian random matrix of independent complex normal entries, the noise
$\mathbf{z}_k^{\mathrm{DL}}$ is assumed to be upper bounded, i.e., $\|\mathbf{z}_k^{\mathrm{DL}}\|_2 \le \epsilon$, and $\|\mathbf{h}_k^{a}\|_{1,\mathbf{w}} = \sum_{i} w_i \, |h_k^{a}[i]|$. In the objective function,
the entries that are expected to be zero are weighted more heavily than others. The results show a significant
improvement over the method without using partial support information when the accuracy level $\alpha$ exceeds a certain
threshold. Moreover, taking a convex geometry approach, the authors have precisely quantified the
required training overhead for achieving a certain percentage of exact recovery. Exact recovery is declared if
$\|\hat{\mathbf{h}}_k^{a} - \mathbf{h}_k^{a}\|_2 < 10^{-1}$. As shown in Fig. 3, the analytical curves of $\alpha = 0.2$ and $\alpha = 0.8$ can accurately depict the
empirical phase transition curves of 60% exact recovery and 55% exact recovery, respectively.

Fig. 3: Phase transition curves of (8) over different values of $\alpha$ given $M = 100$, $s = 10$, $\mathbf{z}_k^{\mathrm{DL}} = \mathbf{0}$, and $\epsilon = 0$ (from [18]).
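
The role of the weights in (8) can be seen on a toy vector. The sketch below only evaluates the weighted objective and the accuracy level of a given partial support; it does not solve the convex program, which would require an optimization solver. The vector and the two candidate support sets are made-up values for illustration.

```python
def weighted_l1_norm(h, partial_support):
    """Objective of (8): sum of |h[i]| over indices outside the partial support."""
    return sum(abs(v) for i, v in enumerate(h) if i not in partial_support)


def support_accuracy(h, partial_support):
    """Fraction of the true support that the partial support information covers."""
    true_support = {i for i, v in enumerate(h) if v != 0}
    return len(true_support & partial_support) / len(true_support)


# Hypothetical angular-domain channel with sparsity level s = 3 (support {1, 4, 6}).
h_true = [0, 1 + 1j, 0, 0, -2, 0, 0.5j, 0]
t_exact = {1, 4, 6}   # perfect partial support information
t_poor = {0, 4, 7}    # only one of three indices is correct

cost_exact = weighted_l1_norm(h_true, t_exact)
cost_poor = weighted_l1_norm(h_true, t_poor)
acc_exact = support_accuracy(h_true, t_exact)
acc_poor = support_accuracy(h_true, t_poor)
print(f"accuracy {acc_exact:.2f}: weighted norm of the true channel = {cost_exact:.3f}")
print(f"accuracy {acc_poor:.2f}: weighted norm of the true channel = {cost_poor:.3f}")
```

When the support information is accurate, the true channel incurs almost no penalty in the objective, which gives an intuition for why fewer measurements can suffice as the accuracy level grows.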

Unlike the previous method, here, channel measurements are not fed back to the BS. In other words, it avoids
the assumption of error-free feedback channels. However, it raises another issue of storing random matrices at UEs
with limited memory. Also, performing convex optimization can impose a stringent computation requirement on
UEs unless low-complexity solutions are found. Several attempts have been made to design practical training
matrices. In [19], Toeplitz-structured training matrices, suggested for realistic implementation, are shown to
perform comparably to Gaussian random matrices and require generating fewer independent random variables. A
deterministic approach to the training matrix design is first considered by appealing to matrix properties such
as mutual coherence [20]. More advanced deterministic training matrices are developed in [21] to yield higher
recovery accuracy. In the context of FDD massive MIMO, it would be interesting to design structurally random or
deterministic training matrices that take partial support information of channels to multiple UEs into consideration.
In addition, similar concepts of using prior channel knowledge to lower training overhead can be found in [4],
where spatial and temporal correlations are harnessed. More study is needed to better understand how to integrate
all the relevant prior knowledge into efficient CSI acquisition.

B. TDD Massive MIMO 

As mentioned in Sec. II-B, employing uplink training to obtain high-dimensional downlink CSI results in undesired
pilot contamination, and the following are some efforts to address this issue.

1) The Coordinated MMSE Method: Contradicting conventional wisdom, it has been shown that it is possible to
mitigate pilot contamination using the linear MMSE estimator [22]. The key factor in determining the success of
MMSE estimation is that each channel to the UE can be regarded as a linear combination of a finite number of steering vectors

$$\mathbf{h}_{i,j,k} = \frac{1}{\sqrt{P}} \sum_{p=1}^{P} \alpha_{i,j,k}(p) \, \mathbf{a}\left[ \theta_{i,j,k}(p) \right] \qquad (9)$$

where P is the number of paths, $\alpha_{i,j,k}(p)$ are zero-mean path gains, and $\mathbf{a}[\theta_{i,j,k}(p)]$ denote the steering vectors
due to angles of arrival (AoAs) $\theta_{i,j,k}(p)$. Consequently, the rank of the channel covariance matrix $\mathbf{R}_{i,j,k} =
\mathbb{E}\{\mathbf{h}_{i,j,k} \mathbf{h}_{i,j,k}^{H}\}$ depends on the range $[\theta_{i,j,k}^{\min}, \theta_{i,j,k}^{\max}]$ in which the AoAs $\theta_{i,j,k}(p)$ lie, which typically turns out to be
low. Let us focus on the kth row of (4), i.e., $\hat{\mathbf{h}}_{i,i,k} = \mathbf{h}_{i,i,k} + \sum_{l \neq i} \mathbf{h}_{i,l,k} + \mathbf{z}_{i,k}$. Based on it, the desired channel can
be further extracted by the MMSE estimator, i.e.,

$$\tilde{\mathbf{h}}_{i,i,k} = \mathbf{R}_{i,i,k} \left( \sigma_z^2 \mathbf{I}_M + \sum_{l=1}^{L} \mathbf{R}_{i,l,k} \right)^{-1} \hat{\mathbf{h}}_{i,i,k} \qquad (10)$$

where the covariance matrix of $\mathbf{z}_{i,k}$ is assumed to be $\sigma_z^2 \mathbf{I}_M$. When the range of AoAs due to interfering UEs that use
the same pilot sequence does not overlap with the AoA range due to the desired UE, the estimate $\tilde{\mathbf{h}}_{i,i,k}$ approaches
the desired channel $\mathbf{h}_{i,i,k}$ as the BS antenna size grows to infinity. This feature is highly attractive because the number
of BS antennas can be made as large as desired in massive MIMO. Moreover, the condition of non-overlapping
AoA ranges can be satisfied if the reused pilot sequence is properly allocated to UEs in neighboring cells. A heuristic
algorithm has been developed to perform pilot allocation in a coordinated manner. Another favorable feature of
this method, recently demonstrated in [23], is that the asymptotically optimal estimate is obtainable whether uniform
or non-uniform arrays are employed. As a result, BS antenna arrays are exempt from the requirement of high
calibration accuracy.
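
The effect of non-overlapping AoA ranges in (10) can be illustrated with a strongly simplified model. The sketch assumes that all covariance matrices are diagonal in a common angular basis, so that the matrix expression in (10) reduces to one scalar gain per angular bin; this diagonal structure, the eight angular bins, and the channel values are assumptions made for the example rather than part of the cited method.

```python
def mmse_estimate(ls_estimate, cov_diags, desired, noise_var):
    """Per-bin form of (10): gain = r_desired / (noise_var + sum of all variances)."""
    return [cov_diags[desired][a] / (noise_var + sum(c[a] for c in cov_diags)) * y
            for a, y in enumerate(ls_estimate)]


def max_error(estimate, reference):
    return max(abs(e - r) for e, r in zip(estimate, reference))


NOISE_VAR = 1e-3
h_desired = [0.8 - 0.3j, -0.5 + 1.1j, 0, 0, 0, 0, 0, 0]
cov_desired = [1, 1, 0, 0, 0, 0, 0, 0]          # desired UE occupies angular bins 0-1

# Case 1: the interfering UE with the same pilot occupies bins 5-6 (no overlap).
h_interf_far = [0, 0, 0, 0, 0, 1.2 + 0.4j, -0.7j, 0]
cov_interf_far = [0, 0, 0, 0, 0, 1, 1, 0]
# Case 2: the interfering UE occupies bins 1-2 (overlap in bin 1).
h_interf_near = [0, 1.2 + 0.4j, -0.7j, 0, 0, 0, 0, 0]
cov_interf_near = [0, 1, 1, 0, 0, 0, 0, 0]

# Contaminated LS observations (noise samples omitted for clarity).
ls_far = [d + i for d, i in zip(h_desired, h_interf_far)]
ls_near = [d + i for d, i in zip(h_desired, h_interf_near)]

err_far = max_error(mmse_estimate(ls_far, [cov_desired, cov_interf_far], 0, NOISE_VAR), h_desired)
err_near = max_error(mmse_estimate(ls_near, [cov_desired, cov_interf_near], 0, NOISE_VAR), h_desired)
print(f"non-overlapping AoA supports: max error = {err_far:.4f}")
print(f"overlapping AoA supports:     max error = {err_near:.4f}")
```

In this toy setting the interference is filtered out almost completely when the angular supports are disjoint, whereas an overlap leaves a residual error in the shared bin.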

The second-order statistics of high-dimensional channels have successfully been utilized to facilitate robust MMSE 
channel estimation under pilot contamination. However, obtaining channel covariance matrices of high dimension 
imposes another challenge to the massive MIMO system. It is interesting to know if the low-rankness can help speed 
up the acquisition of channel covariance matrices. Furthermore, it is still unknown if this covariance-matrix-aware 
method is sensitive to the inaccuracy of the second-order statistics. On the other hand, the information about AoAs
can actually be extracted from statistical channel knowledge prior to commencing the instantaneous CSI acquisition
[24]. In this case, the dimension of the parameter space of each channel shrinks to P, which can be significantly
smaller than the original dimension M. Most importantly, this information could aid BSs in distinguishing between training signals
from UEs using the same pilot.

2) The Quadratic Semidefinite Programming (SDP) Method: It is suggested that a BS should collect CSI of both
the desired links within the cell and interference links from its neighboring cells [25]. In other words, the CSI of
interference links should not be regarded as irrelevant information. From this new angle, the expression (3) can be
recast as

$$\mathbf{Y}_{i}^{\mathrm{UL}} = \mathbf{S}^{\mathrm{UL}} \mathbf{H}_i + \mathbf{Z}_{i}^{\mathrm{UL}} \qquad (11)$$

where $\mathbf{S}^{\mathrm{UL}} = [\mathbf{S}_{1}^{\mathrm{UL}}, \ldots, \mathbf{S}_{L}^{\mathrm{UL}}]$ and $\mathbf{H}_i = [\mathbf{H}_{i,1}^{T}, \ldots, \mathbf{H}_{i,L}^{T}]^{T}$ is the full CSI of wireless links that should be recovered.
Thus, the currently challenging issue is similar to that in FDD massive MIMO, i.e., how to reduce the required
training overhead.

TABLE I: Comparison of Sparsity-Inspired CSI Acquisition Methods

| Methods | Sparsity Types | Pros | Cons |
|---|---|---|---|
| Joint CSI Recovery (FDD) | Sparse channel vectors & common supports | Jointly exploit sparsity & common-support property; perform channel recovery at the BS | UEs need to feed back perfect channel measurements |
| Weighted $\ell_1$ Minimization (FDD) | Sparse channel vectors & partial support information | Sharp estimate of the required training overhead; lower training overhead | Need to obtain partial support information |
| Coordinated MMSE (TDD) | Low-rank channel covariance matrices | Performance improves with increasing antenna size; lower training overhead | Need to obtain second-order channel statistics |
| Quadratic SDP (TDD) | Low-rank channel matrices | No need for knowledge of second-order channel statistics | Only suitable for poor scattering propagation environments; higher training overhead |
| Sparse Bayesian Learning (TDD) | Sparse channel vectors in the UE domain | No need for knowledge of second-order channel statistics | Channels are not jointly recovered; higher training overhead |

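In poor scattering propagation environments, the rank of the channel matrix is equal to the number
r of feasible AoAs $\phi$ in the set $\Theta$, which is much smaller than $\max\{M, K \cdot L\}$. Based on this observation, a
nuclear norm regularized problem can be formulated as

$$\min_{\mathbf{H}_i} \ \frac{1}{2} \left\| \mathrm{vec}(\mathbf{Y}_{i}^{\mathrm{UL}}) - \mathbf{\Phi} \, \mathrm{vec}(\mathbf{H}_i) \right\|_2^2 + \gamma \|\mathbf{H}_i\|_{*}, \qquad (12)$$

where $\mathbf{\Phi} = \mathbf{S}^{\mathrm{UL}} \otimes \mathbf{I}_M$ and $\gamma$ is a regularization factor. The purpose of adopting nuclear norm regularization is to
minimize the sum of the matrix's singular values, thereby promoting a low-rank solution. The above problem has
been further recast as a quadratic SDP problem

$$\min_{\mathbf{v}} \ \frac{1}{2} \mathbf{v}^{H} \mathbf{v} - \Re\left\{ [\mathrm{vec}(\mathbf{Y}_{i}^{\mathrm{UL}})]^{H} \mathbf{v} \right\} \quad \text{s.t.} \quad \begin{bmatrix} \gamma \mathbf{I}_{KL} & \mathrm{vec}^{-1}_{KL,M}(\mathbf{\Phi}^{H} \mathbf{v}) \\ \left[ \mathrm{vec}^{-1}_{KL,M}(\mathbf{\Phi}^{H} \mathbf{v}) \right]^{H} & \gamma \mathbf{I}_{M} \end{bmatrix} \succeq 0. \qquad (13)$$

The solution $\mathbf{v}^{\star}$ to this SDP problem determines the estimate of the channel matrix

$$\mathbf{H}_i^{\star} = \mathrm{vec}^{-1}_{KL,M} \left\{ \mathbf{\Phi}^{\dagger} \left[ \mathrm{vec}(\mathbf{Y}_{i}^{\mathrm{UL}}) - \mathbf{v}^{\star} \right] \right\}, \qquad (14)$$

which can now be obtained efficiently, thanks to the readily available polynomial-time SDP solvers.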

In the early study of massive MIMO [26], the CSI of interference links at BSs is viewed as nonessential.
This is because desired links and interference links are asymptotically orthogonal, and more importantly,
intercell interference can be proved manageable with the CSI of desired links only. Here, we offer an explanation of
why there is a need for acquiring the CSI of interference links in poor scattering environments. Consider that
$\mathbf{H}_i = \mathbf{G}_i \mathbf{A}$, where $\mathbf{A} = [\mathbf{a}(\phi_1), \ldots, \mathbf{a}(\phi_r)]^{T}$ is an $r \times M$ matrix of full row rank with $r \ll \min\{M, KL\}$ due
to poor scattering, and $\mathbf{G}_i$ is a $KL \times r$ matrix of independent and identically distributed (i.i.d.) zero-mean channel
gains. Then, we have $\lim_{M \to \infty} \mathbf{A} \mathbf{A}^{H} = \mathbf{I}_r$, and

$$\lim_{M \to \infty} \mathbf{H}_i \mathbf{H}_i^{H} = \mathbf{G}_i \mathbf{G}_i^{H} \neq \mathbf{I}_{KL} \qquad (15)$$

which implies that the correlation among wireless links does not diminish with the increasing BS antenna size. In
such a situation, it becomes crucial to obtain the full CSI of wireless links for effective interference management.
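
The limit in (15) can be verified numerically on a small example. The sketch below assumes unit-norm steering vectors of a half-wavelength uniform linear array and uses made-up values for the r = 2 AoAs and for the gains of KL = 3 links; these choices are illustrative assumptions only. It compares the Gram matrix of the channel rows for a large array with the Gram matrix of the gains.

```python
import cmath
import math


def steering(angle, m):
    """Unit-norm steering vector of an m-element half-wavelength ULA (assumed model)."""
    return [cmath.exp(1j * math.pi * n * math.sin(angle)) / math.sqrt(m) for n in range(m)]


def gram(rows):
    """Return R R^H for a matrix R given as a list of rows."""
    return [[sum(a * b.conjugate() for a, b in zip(ri, rj)) for rj in rows] for ri in rows]


def channel_rows(gains, angles, m):
    """Rows of H = G A, where the rows of A are the steering vectors of the r AoAs."""
    basis = [steering(phi, m) for phi in angles]
    return [[sum(g * a[n] for g, a in zip(g_row, basis)) for n in range(m)]
            for g_row in gains]


ANGLES = [0.2, 0.9]                                   # r = 2 AoAs in radians
GAINS = [[1.0 + 0.5j, -0.3j],                         # KL = 3 links, r = 2 gains each
         [0.4 - 1.0j, 0.8 + 0.0j],
         [-0.6 + 0.0j, 0.2 + 0.9j]]

limit = gram(GAINS)                                   # G G^H
finite = gram(channel_rows(GAINS, ANGLES, 1024))      # H H^H for M = 1024
deviation = max(abs(finite[i][j] - limit[i][j]) for i in range(3) for j in range(3))
cross_correlation = abs(finite[0][1])
print(f"max deviation from G G^H: {deviation:.4f}")
print(f"|correlation between links 0 and 1|: {cross_correlation:.3f}")
```

Even with 1024 antennas the correlation between the two links stays close to the corresponding entry of the gain Gram matrix instead of vanishing, in line with (15).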

3) The Sparse Bayesian Learning (SBL) Method: Sharing the same perspective as the study [25], the work in [27]
also considers acquiring the full CSI of wireless links and proposes a sparse Bayesian learning method to achieve
this goal. Sparse Bayesian learning was first presented in [28] and has been shown to outperform some prevailing $\ell_1$
minimization algorithms [29]. The SBL method proceeds by first transforming the channel matrix into the angular
domain via DFT as mentioned in the joint CSI recovery method, i.e., $\mathbf{H}_i^{a} = \mathbf{H}_i \mathbf{U}$. Interestingly, instead of taking
advantage of the sparsity in the angular domain, the sparsity in the UE domain, which has been empirically shown
to exist, is utilized. In other words, the column vectors of the channel matrix $\mathbf{H}_i^{a}$ are considered one by one. As
each column vector consists of elements due to different UEs, the independence among elements can be reasonably
assumed. This independence together with the sparsity in the UE domain leads to an effective Gaussian-mixture
(GM) model which describes the joint distributions of the channel elements well. More surprisingly, empirical
results show that only a few parameters of the GM model need to be determined. Therefore,
the practical Bayes estimation can be implemented by evaluating marginal probability density functions via the
approximate message passing (AMP) algorithm [30] and learning GM parameters by means of the
expectation-maximization (EM) algorithm [31]. The numerical results show that this Bayesian method can achieve a significant
reduction in estimation errors.
The assumption of channel vectors being sparse in the UE domain may not hold when the UE dimension KL
is not large enough. A possible remedy for this situation is suggested in the following. First, it is desirable to
understand if the GM model is also applicable for modeling distributions of sparse channel vectors in the angular
domain. Second, as angular-domain channels are very likely to consist of a small number of block-wise non-zero
segments resulting from a few clusters of scatterers, it is reasonable to assume some dependence among
angular-domain channel elements. Hence, the distribution of the channel vector could be a mixture of Gaussian
random vectors, and the original AMP and EM algorithms should be modified according to this new GM model.

C. Discussion and Comparison 

In the previous subsections, several methods for efficient high-dimensional CSI acquisition have been discussed 
for massive MIMO communications. Table I provides a brief summary of the advantages and disadvantages of
these methods. It is shown in the table that each method utilizes a distinct sparsity structure. However, all sparsity 
structures considered in massive MIMO are based on the observation that angular-domain channels are sparse. As a 
result, the second-order statistics of massive MIMO channels inherit the sparsity structure, yielding low-rank channel 
covariance matrices. In addition, as sparse channels are collectively examined, it leads to either block-sparse or 
low-rank channel matrices. When the UE dimension is comparable to the channel dimension, sparsity in the angular 
domain also results in sparsity in the UE domain. On the basis of the aforementioned sparsity structures, different 
sparsity-inspired methods are developed either to reduce training overhead or to mitigate pilot contamination. 

In FDD massive MIMO, without feeding back channel measurements to the BS side, fewer sparsity structures are
available for developing efficient CSI acquisition methods. Despite this limitation, the weighted $\ell_1$ minimization
method shows that achieving further overhead reduction is feasible if partial support information can be obtained in 
advance and properly harnessed. Interestingly, by enabling the BS to gather perfect channel measurements from its 
served UEs, the joint CSI recovery method offers an effective way of utilizing sparsity structures across multiple 
UEs. If the performance superiority of this method still holds when taking rate-limited feedback channels into 
account, it will establish the fact that offloading CSI acquisition tasks to the BS is feasible and beneficial. 

With regard to TDD massive MIMO, uplink training has more sparsity structures to utilize as high-dimensional 
channels are jointly recovered at the BS side. It is worth noting that only low-rank channel covariance matrices 
have been used for pilot decontamination. Other sparsity structures such as low-rank channel matrices and sparse 
UE-domain channels have not been considered for mitigating the effects of pilot reuse. In this regard, there is 
still much room for innovation in sparsity-inspired pilot decontamination. It is also worth noting that using perfect 
covariance matrices of both desired channels and interference channels in the coordinated MMSE method has drawn 
criticism [32]. It would be intriguing to assess whether efficient algorithms exist for learning low-rank covariance 
matrices. If such algorithms are developed or identified, they should be integrated into the coordinated MMSE 
method. 

D. Implementation Issues 

Recent studies have examined the practical implementation of compressed-sensing-based algorithms for 
sparse channel recovery [33]-[35]. Although the design targets are channel models in the 3GPP LTE standard, 
several of the insights provided remain valuable and applicable to realistic implementations of sparse 
massive MIMO channel recovery. It has been pointed out that greedy algorithms such as OMP or matching pursuit 
(MP) are more desirable from a hardware perspective, because they require lower computational 
complexity and lower numerical precision than convex relaxation algorithms such as basis pursuit 
(BP) [34]. The trade-off between hardware complexity and denoising performance of three greedy algorithms has 
been characterized in 031 and it is indicated that the chip area overhead required to implement the gradient pursuit 
(GP) algorithm can be three times larger than that of MP. The power consumption is normally proportional to this area 
overhead. When it comes to the design of channel recovery algorithms in FDD massive MIMO, which are typically 
performed at the UE side, the issue of hardware complexity should be carefully taken into account. On the other 
hand, at the BS side, high-dimensional channels can be recovered by more advanced algorithms such as sparse 
Bayesian learning or joint CSI recovery. 
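
To illustrate why greedy algorithms are considered lightweight, the following is a minimal NumPy sketch of OMP, assuming a noiseless real-valued measurement model y = Φh with a known sparsity level. The function name `omp`, the random sensing matrix, and the problem sizes are hypothetical and are not taken from the cited implementations.

```python
import numpy as np

def omp(Phi, y, sparsity):
    """Orthogonal matching pursuit: greedily pick columns of Phi to explain y."""
    residual = y.copy()
    support = []
    coeffs = np.zeros(0)
    for _ in range(sparsity):
        # Correlate every column with the current residual.
        correlations = np.abs(Phi.conj().T @ residual)
        if support:
            correlations[support] = 0.0  # never reselect a chosen column
        support.append(int(np.argmax(correlations)))
        # Least-squares fit restricted to the selected columns.
        coeffs, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coeffs
    h_hat = np.zeros(Phi.shape[1], dtype=np.result_type(Phi, y))
    h_hat[support] = coeffs
    return h_hat, sorted(support)

# Toy example: 3-sparse vector of length 128 observed through 64 measurements.
rng = np.random.default_rng(0)
M, N = 64, 128
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
h_true = np.zeros(N)
true_support = [5, 40, 97]
h_true[true_support] = [3.0, -2.0, 1.5]
y = Phi @ h_true
h_hat, est_support = omp(Phi, y, sparsity=3)
print(est_support, np.max(np.abs(h_hat - h_true)))
```

Each iteration needs only one matrix-vector correlation and a small least-squares solve over the selected columns; MP omits the least-squares step altogether, which is consistent with its smaller hardware footprint.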

E. Implications of New Propagation Models 

Most existing studies have based their CSI acquisition approaches on the conventional MIMO channel models, 
which may fail to capture some unique characteristics of massive MIMO channels. For instance, the far-field and 
plane wavefront assumptions no longer hold when antenna arrays become so large that UEs or scatterers fall within 
the Rayleigh distance of the array [36]. Moreover, the sheer size of antenna arrays, where different antenna 
elements observe varying subsets of scatterer clusters, invalidates the assumption that spatial channels are 
wide-sense stationary along the array axis [37]. While new channel models have been proposed in [38], [39] by making a more 
accurate spherical wavefront assumption and taking the non-stationarities into consideration, there is still very little 
understanding of how these characteristics affect the sparsity structures of the channels in massive MIMO systems. 
One previous result [40], however, suggests that the spherical wavefront model does adequately characterize the 
rank of the channel matrix. This implies that the new channel models can potentially affect the SDP method which 
exploits the sparsity in the form of the channel matrix rank. In addition, the possibility that none of the clusters is 
perceptible to some antenna elements cannot be categorically excluded, which indicates the possible presence of 
sparsity on the array axis. These inferences suggest that there is abundant room for further progress in identifying 
utilizable sparsity structures based on the latest models. 
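
To give a sense of scale, the Rayleigh distance is conventionally taken as 2D²/λ for an array of aperture D at wavelength λ. The sketch below assumes a uniform linear array with half-wavelength spacing; the function name and the 8-element and 128-element examples at 2.6 GHz are hypothetical.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s

def rayleigh_distance(num_antennas, carrier_hz, spacing_in_wavelengths=0.5):
    """Far-field boundary 2*D^2/lambda for a uniform linear array (assumed geometry)."""
    wavelength = SPEED_OF_LIGHT / carrier_hz
    # Aperture: distance between the first and the last antenna element.
    aperture = (num_antennas - 1) * spacing_in_wavelengths * wavelength
    return 2.0 * aperture ** 2 / wavelength

d_small = rayleigh_distance(8, 2.6e9)     # conventional MIMO array
d_large = rayleigh_distance(128, 2.6e9)   # massive MIMO array
print(f"{d_small:.2f} m, {d_large:.1f} m")
```

Under these assumptions the boundary grows from a few meters for the 8-element array to roughly 930 m for the 128-element array, so a large share of UEs and scatterers in a typical cell would no longer satisfy the plane wavefront assumption.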

IV. Conclusions 

In this article, the challenges of acquiring high-dimensional CSI in FDD/TDD massive MIMO systems have been 
discussed. To address these challenges and break the curse of dimensionality, one can effectively utilize sparsity 
structures that uniquely appear in massive MIMO channels. Several state-of-the-art sparsity-inspired approaches 
for high-dimensional CSI acquisition have been examined and compared in terms of the sparsity structures being 
exploited, and their respective advantages and disadvantages have been identified. As a result of this study, the following 
conclusions can be drawn. The sparsity structures that can be harnessed are conditional on the radio propagation 
environments. In TDD massive MIMO, uplink training inherently has more sparsity structures to exploit as 
high-dimensional channels are jointly recovered at the BS. By contrast, in the FDD mode, the desired channel is 
normally recovered at the UE where utilizable sparsity structures are limited. Finally, based upon existing approaches, 
we have identified the potential research problems in need of further investigation. 